
TraceDB: Snapshot-backed state for the trace baker #3360

Merged
Kbhat1 merged 11 commits into main from pr/trace-snapshot-main
May 12, 2026

Conversation

@Kbhat1
Contributor

@Kbhat1 Kbhat1 commented May 1, 2026

Describe your changes and provide context

  • Adds optional snapshot-backed trace baking so the baker can replay from in-memory memiavl state instead of SS-pebble.
  • Refcounts memiavl snapshots and releases trace leases through geth's existing StateReleaseFunc, avoiding GC finalizers.
  • Opt-in via trace_bake_use_snapshot; falls back when the backend cannot provide a snapshot.

Testing performed to validate your change

  • Verified on node
  • Unit tests
  • Long-running node on mainnet
  • go test ./sei-db/state_db/sc/memiavl -run 'TreeCopy|Snapshot' -count=1
[Screenshot: Screen Shot 2026-05-07 at 12 59 13 PM]

Note

Medium Risk
Medium risk because it changes how debug_trace* replays acquire/release historical state and adds new snapshot lifecycle management; while gated behind config and with fallbacks, bugs could cause leaks, stale reads, or trace failures under load.

Overview
Adds optional snapshot-backed trace baking: when trace_bake_use_snapshot is enabled and the node is using the storev2 root multi-store, EndBlock captures an in-memory SC snapshot and the trace baker/debug endpoints replay against that snapshot instead of SS-pebble.

Plumbs a new TraceContextProvider through evmrpc so debug tracing can obtain a context plus a release function, and implements SnapshotAwareRPCContextProvider to build contexts from leased snapshots (with consensus params populated) and fall back to the existing RPC context on misses/unsupported backends.
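
The "context plus a release function, with fallback" shape described above can be sketched roughly as follows. This is an illustration only, not the PR's code: `traceState`, `releaseFunc`, `provider`, and `withSnapshotFallback` are hypothetical names, and a plain struct stands in for the real context.

```go
package main

import "fmt"

// releaseFunc returns a leased resource; callers invoke it exactly once.
type releaseFunc func()

// traceState is a hypothetical stand-in for the context handed to a tracer.
type traceState struct {
	height int64
	source string // "snapshot" or "fallback"
}

// provider yields the state for a height plus the function that releases it.
type provider func(height int64) (traceState, releaseFunc)

// withSnapshotFallback prefers a leased snapshot and falls back to the
// existing path (with a no-op release) when no snapshot is retained.
func withSnapshotFallback(
	lease func(height int64) (releaseFunc, bool),
	fallback func(height int64) traceState,
) provider {
	return func(height int64) (traceState, releaseFunc) {
		if release, ok := lease(height); ok {
			return traceState{height: height, source: "snapshot"}, release
		}
		return fallback(height), func() {}
	}
}

func main() {
	released := 0
	// Assumption for the demo: only height 99 has a retained snapshot.
	lease := func(height int64) (releaseFunc, bool) {
		if height != 99 {
			return nil, false
		}
		return func() { released++ }, true
	}
	fallback := func(height int64) traceState {
		return traceState{height: height, source: "fallback"}
	}
	p := withSnapshotFallback(lease, fallback)
	for _, h := range []int64{99, 42} {
		state, release := p(h)
		fmt.Println(state.height, state.source)
		release()
	}
	fmt.Println("leases released:", released)
}
```

Returning a no-op release on the fallback path lets callers release unconditionally, which keeps the call site identical for both backends.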

Extends the state-commit/memiavl layer to support Committer.Copy() snapshots and explicit ref-release, including refcounted memiavl snapshot mmaps and new tests to prevent use-after-unmap and verify eviction/lease semantics.
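
A minimal sketch of the refcounting idea, under stated assumptions: `refSnapshot`, `acquire`, and `release` are hypothetical names, and a boolean stands in for the mmap. It is not the memiavl implementation; it only shows why a held lease keeps the mapping alive until the final release.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errReleased = errors.New("snapshot already fully released")

// refSnapshot is a hypothetical stand-in for an mmap-backed snapshot.
type refSnapshot struct {
	mu     sync.Mutex
	refs   int
	mapped bool // stands in for the live mmap
}

// newRefSnapshot starts with one reference, held by the owner.
func newRefSnapshot() *refSnapshot {
	return &refSnapshot{refs: 1, mapped: true}
}

// acquire adds a reference; it fails once the snapshot is unmapped.
func (s *refSnapshot) acquire() error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.refs == 0 {
		return errReleased
	}
	s.refs++
	return nil
}

// release drops a reference and unmaps only when the count reaches zero.
func (s *refSnapshot) release() error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.refs == 0 {
		return errReleased // over-release: a refcount mismatch
	}
	s.refs--
	if s.refs == 0 {
		s.mapped = false
	}
	return nil
}

func (s *refSnapshot) isMapped() bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.mapped
}

func main() {
	snap := newRefSnapshot()
	_ = snap.acquire() // a tracer takes a lease
	_ = snap.release() // the owner closes, e.g. during a background rewrite
	fmt.Println("mapped while lease held:", snap.isMapped())
	_ = snap.release() // the tracer finishes
	fmt.Println("mapped after final release:", snap.isMapped())
	fmt.Println("over-release error:", snap.release())
}
```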

Reviewed by Cursor Bugbot for commit 4e744cf. Bugbot is set up for automated code reviews on this repo.

@Kbhat1 Kbhat1 force-pushed the pr/trace-snapshot-main branch 4 times, most recently from 1764fc9 to c173794 Compare May 1, 2026 15:42
@Kbhat1 Kbhat1 changed the base branch from main to pr/trace-baker-main May 1, 2026 15:46
@Kbhat1 Kbhat1 changed the title Snapshot-backed state for the trace baker TraceDB: Snapshot-backed state for the trace baker May 1, 2026
@Kbhat1 Kbhat1 force-pushed the pr/trace-baker-main branch from 7b8a363 to af0de0f Compare May 1, 2026 15:59
@Kbhat1 Kbhat1 force-pushed the pr/trace-snapshot-main branch 2 times, most recently from 2412434 to e0a56bd Compare May 1, 2026 18:22
@Kbhat1 Kbhat1 force-pushed the pr/trace-baker-main branch from af0de0f to 6981a66 Compare May 1, 2026 19:05
@Kbhat1 Kbhat1 force-pushed the pr/trace-snapshot-main branch from e0a56bd to c5e3d21 Compare May 1, 2026 19:07
@Kbhat1 Kbhat1 force-pushed the pr/trace-baker-main branch from 6981a66 to fe9ec89 Compare May 1, 2026 21:02
@Kbhat1 Kbhat1 force-pushed the pr/trace-snapshot-main branch from c5e3d21 to 4bfc441 Compare May 1, 2026 21:04
@sei-protocol sei-protocol deleted a comment from github-actions Bot May 4, 2026
@Kbhat1 Kbhat1 force-pushed the pr/trace-baker-main branch from ae34e85 to f861506 Compare May 4, 2026 21:03
@Kbhat1 Kbhat1 force-pushed the pr/trace-snapshot-main branch from 4bfc441 to cf01744 Compare May 4, 2026 21:05
@github-actions

github-actions Bot commented May 4, 2026

The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).

Build: ✅ passed | Format: ✅ passed | Lint: ✅ passed | Breaking: ✅ passed | Updated (UTC): May 12, 2026, 2:50 PM

@Kbhat1 Kbhat1 force-pushed the pr/trace-baker-main branch from 7c210c4 to 27dc9b0 Compare May 4, 2026 21:40
@Kbhat1 Kbhat1 force-pushed the pr/trace-snapshot-main branch from cf01744 to c28c23f Compare May 4, 2026 21:41
@codecov

codecov Bot commented May 4, 2026

Codecov Report

❌ Patch coverage is 67.28625% with 88 lines in your changes missing coverage. Please review.
✅ Project coverage is 59.26%. Comparing base (09d0d60) to head (4e744cf).

Files with missing lines Patch % Lines
app/app.go 47.61% 16 Missing and 6 partials ⚠️
x/evm/keeper/trace_snapshot.go 78.12% 7 Missing and 7 partials ⚠️
sei-cosmos/storev2/rootmulti/store.go 55.17% 9 Missing and 4 partials ⚠️
evmrpc/simulate.go 75.00% 7 Missing and 4 partials ⚠️
evmrpc/config/config.go 0.00% 4 Missing and 2 partials ⚠️
sei-db/state_db/sc/composite/store.go 62.50% 3 Missing and 3 partials ⚠️
sei-db/state_db/sc/memiavl/db.go 55.55% 2 Missing and 2 partials ⚠️
sei-db/state_db/sc/memiavl/store.go 69.23% 2 Missing and 2 partials ⚠️
x/evm/keeper/abci.go 0.00% 2 Missing and 1 partial ⚠️
evmrpc/server.go 60.00% 1 Missing and 1 partial ⚠️
... and 2 more
Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3360      +/-   ##
==========================================
- Coverage   59.26%   59.26%   -0.01%     
==========================================
  Files        2110     2111       +1     
  Lines      174242   174387     +145     
==========================================
+ Hits       103259   103343      +84     
- Misses      62053    62081      +28     
- Partials     8930     8963      +33     
Flag Coverage Δ
sei-chain-pr 65.71% <67.28%> (?)
sei-db 70.41% <ø> (ø)

Flags with carried forward coverage won't be shown.

Files with missing lines Coverage Δ
evmrpc/tracers.go 67.95% <ø> (ø)
sei-db/state_db/sc/memiavl/tree.go 88.20% <100.00%> (+0.12%) ⬆️
sei-db/state_db/sc/memiavl/snapshot.go 64.37% <96.87%> (+1.77%) ⬆️
evmrpc/server.go 87.79% <60.00%> (-0.67%) ⬇️
x/evm/keeper/keeper.go 54.83% <50.00%> (-0.06%) ⬇️
x/evm/keeper/abci.go 55.55% <0.00%> (-1.59%) ⬇️
sei-db/state_db/sc/memiavl/db.go 67.10% <55.55%> (+0.43%) ⬆️
sei-db/state_db/sc/memiavl/store.go 91.17% <69.23%> (-2.32%) ⬇️
evmrpc/config/config.go 70.43% <0.00%> (-3.88%) ⬇️
sei-db/state_db/sc/composite/store.go 73.36% <62.50%> (-0.77%) ⬇️
... and 4 more

... and 38 files with indirect coverage changes

Base automatically changed from pr/trace-baker-main to main May 8, 2026 18:31
Kbhat1 and others added 5 commits May 8, 2026 17:17
Stacks on the trace baker PR. Captures an O(1) memiavl snapshot of
the SC tree at EndBlock and serves trace re-execution from in-RAM
state instead of SS-pebble.

memiavl: refcount *Snapshot. Tree.Copy() Acquires; Snapshot.Close
unmaps only on the final release. Without this a held copy was a
use-after-munmap waiting to happen — the background snapshot rewrite
calls Tree.ReplaceWith → snapshot.Close mid-flight, segfaulting any
held copy. The internal rewrite goroutine also drops its clone's
ref so the refcount can reach zero.

Committer interface gains Copy(). memiavl delegates to *DB.Copy.
composite returns nil when flatkv is engaged so the snapshot path
silently falls back.

storev2 rootmulti adds SnapshotSCStore + CacheMultiStoreFromCommitter.

EVM keeper: TraceSnapshotStore (bounded by-height map) and EndBlock
capture keyed by snapshot.Version() (= H-1 at EndBlock(H)).
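
A "bounded by-height map" could look roughly like the sketch below. The names (`boundedStore`, `put`, `get`) and the evict-lowest-height policy are assumptions for illustration; the real TraceSnapshotStore may differ, and in the real store evicted entries would also have their snapshot refs released.

```go
package main

import (
	"fmt"
	"sort"
)

// boundedStore is a hypothetical by-height map that retains at most
// `window` entries. Strings stand in for snapshots.
type boundedStore struct {
	window   int
	byHeight map[int64]string
}

func newBoundedStore(window int) *boundedStore {
	return &boundedStore{window: window, byHeight: make(map[int64]string)}
}

// put stores v at height, then evicts the lowest heights until at most
// `window` entries remain. It returns the evicted heights in ascending order.
func (s *boundedStore) put(height int64, v string) []int64 {
	s.byHeight[height] = v
	heights := make([]int64, 0, len(s.byHeight))
	for h := range s.byHeight {
		heights = append(heights, h)
	}
	sort.Slice(heights, func(i, j int) bool { return heights[i] < heights[j] })
	var evicted []int64
	for len(heights) > s.window {
		delete(s.byHeight, heights[0])
		evicted = append(evicted, heights[0])
		heights = heights[1:]
	}
	return evicted
}

// get reports whether a snapshot is retained for the height.
func (s *boundedStore) get(height int64) (string, bool) {
	v, ok := s.byHeight[height]
	return v, ok
}

func main() {
	store := newBoundedStore(2)
	for h := int64(10); h <= 12; h++ {
		evicted := store.put(h, fmt.Sprintf("snap@%d", h))
		fmt.Println("put", h, "evicted", evicted)
	}
	_, ok := store.get(10)
	fmt.Println("height 10 still retained:", ok)
}
```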

App: SnapshotAwareRPCContextProvider builds the sdk.Context directly
from the snapshot CMS to skip the throwaway CacheMultiStoreWithVersion
that CreateQueryContext would otherwise make.

Configurable via [evm]:
  trace_bake_use_snapshot     (default false)
  trace_bake_snapshot_window  (default 64)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Kbhat1 and others added 3 commits May 8, 2026 17:17
Point to the existing memiavl MemNode gauges and trace-baker counters that
operators should watch when enabling the snapshot path on high-throughput
nodes. No new metrics — just signposts to ones that already exist.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Kbhat1 Kbhat1 force-pushed the pr/trace-snapshot-main branch from 9bba87f to aacf715 Compare May 8, 2026 21:25
Comment thread evmrpc/simulate.go
Resolve conflict in sei-db/state_db/sc/types/types.go by keeping both
the Copy() addition from this branch and the Importer doc comment from
main.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Kbhat1 Kbhat1 requested a review from blindchaser May 11, 2026 20:00
Comment thread evmrpc/server.go
homeDir string,
stateStore types.StateStore,
isPanicOrSyntheticTxFunc func(ctx context.Context, hash common.Hash) (bool, error), // used in *ExcludeTraceFail endpoints
traceCtxProviders ...TraceContextProvider,
Contributor

Only the first element is ever read. A non-variadic *TraceContextProvider parameter (or a small options struct) avoids the "what if someone passes two" ambiguity and keeps the signature self-documenting.
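
To make the trade-off concrete, here is a small sketch contrasting the two signatures. All names (`traceProvider`, `pickVariadic`, `pickPointer`) are hypothetical and not taken from the PR.

```go
package main

import "fmt"

// traceProvider is a hypothetical stand-in for TraceContextProvider.
type traceProvider struct{ name string }

// pickVariadic mirrors the variadic-override pattern: only the first
// element is consulted, and any extras are silently ignored.
func pickVariadic(def traceProvider, overrides ...traceProvider) traceProvider {
	if len(overrides) > 0 {
		return overrides[0]
	}
	return def
}

// pickPointer is the non-variadic alternative: nil means "use the default",
// and passing two providers is impossible by construction.
func pickPointer(def traceProvider, override *traceProvider) traceProvider {
	if override != nil {
		return *override
	}
	return def
}

func main() {
	def := traceProvider{name: "default"}
	a := traceProvider{name: "a"}
	b := traceProvider{name: "b"}

	fmt.Println(pickVariadic(def).name)       // no override
	fmt.Println(pickVariadic(def, a, b).name) // b is dropped without warning
	fmt.Println(pickPointer(def, nil).name)
	fmt.Println(pickPointer(def, &a).name)
}
```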

Contributor Author

Keeping the variadic override to minimize PR scope, since only one provider is intentionally supported.

}

// Close releases all retained snapshots.
func (s *TraceSnapshotStore) Close() {
Contributor

TraceSnapshotStore.Close() returns nothing, but inside it ignores per-snapshot release errors:

x/evm/keeper/trace_snapshot.go:97-112

_ = releaser.ReleaseSnapshotRefs()

Same as the WARN above: refcount mismatches are real bugs. Either return error or log at WARN level on close to keep ops visibility.

Contributor Author

Updated: release/close errors are now logged at WARN instead of being swallowed.

Comment thread x/evm/keeper/abci.go
defer telemetry.ModuleMeasureSince(types.ModuleName, time.Now(), telemetry.MetricKeyEndBlocker)
// Bake height-1: at EndBlock(N) the indexer's safe latest is N-1, so
// N-1 is the most recent block guaranteed to be queryable.
// Bake height-1: at EndBlock(N) the indexer's safe latest is N-1. When
Contributor

EndBlock snapshot semantics are subtle — comment is dense, easy to mis-read

The off-by-one here is correct but non-obvious: storev2/rootmulti.flush() doesn't run until Commit(), so at EndBlock(N) the SC tree state is state_after_commit_of_(N-1) and snap.Version() == N-1. The baker then traces H=N-1, whose initializeBlock calls ctxProvider(H-1) = ctxProvider(N-2), which finds snap[N-2] Put at the previous EndBlock(N-1). Worth a one-line "lined up because rs.flush is called from Commit, not from EndBlock" in the comment to save the next reader a half-hour.

Also: initializeBlock calls the provider twice — once for prevBlockHeight (H-1) and once for blockNumber (H) (for WithNextMs). That means a single trace leases both snap[H-1] and snap[H]. As long as TraceBakeSnapshotWindow >= 2 this is fine, but if an operator misconfigures window=1 the second lease will miss and silently fall through to SS-pebble for oracle_mem/WithNextMs. Consider clamping window to >= 2 (or whatever the documented minimum is) at config-load time.
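
The height arithmetic and the suggested clamp can be sketched as below. This is an illustration of the reasoning in this comment, not the PR's code: `minSnapshotWindow`, `clampSnapshotWindow`, `snapshotVersionAtEndBlock`, and `leasedHeights` are hypothetical names, and the floor of 2 is the value proposed above.

```go
package main

import "fmt"

// minSnapshotWindow is an assumed floor: one trace leases two heights.
const minSnapshotWindow = 2

// clampSnapshotWindow raises a too-small configured window to the floor.
func clampSnapshotWindow(configured int) int {
	if configured < minSnapshotWindow {
		return minSnapshotWindow
	}
	return configured
}

// snapshotVersionAtEndBlock reflects that state is flushed at Commit, so
// EndBlock(n) still sees the tree as committed for n-1.
func snapshotVersionAtEndBlock(n int64) int64 { return n - 1 }

// leasedHeights lists the snapshot heights a trace of block h reads:
// h-1 for the previous-block context and h for the next-block store.
func leasedHeights(h int64) [2]int64 { return [2]int64{h - 1, h} }

func main() {
	n := int64(100)
	h := snapshotVersionAtEndBlock(n) // the baker traces H = N-1
	fmt.Println("snapshot version at EndBlock:", h)
	fmt.Println("heights leased by the trace:", leasedHeights(h))
	fmt.Println("window 1 clamps to:", clampSnapshotWindow(1))
}
```

With a window of 1 only the newest height would be retained, so the lease for the older of the two heights would miss; clamping at load time rules that configuration out.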

Contributor Author

Fixed the root cause by enforcing a snapshot window >= 2. Leaving the comment wording as is for now.

Comment thread sei-db/state_db/sc/memiavl/db.go Outdated
Comment thread x/evm/keeper/trace_snapshot.go

@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 4e744cf.

for _, snap := range toRelease {
releaseSnapshotRefs(snap)
}
}

TraceSnapshotStore.Close swallows errors silently on shutdown

Low Severity

TraceSnapshotStore.Close() returns nothing, so HandleClose in app.go cannot collect or propagate snapshot-release errors, unlike every other resource in that function which appends errors to errs. A refcount mismatch (an over-close indicating a real bug) would only appear as a WARN log line instead of surfacing through the standard error-return path.
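
One way the error-returning variant might look, as a sketch only: `snapshotStore` and `fakeSnap` are hypothetical, and the `ReleaseSnapshotRefs() error` signature is an assumption inferred from the `_ = releaser.ReleaseSnapshotRefs()` call quoted earlier in this thread. `errors.Join` requires Go 1.20 or later.

```go
package main

import (
	"errors"
	"fmt"
)

// releaser is assumed to match the method used in the PR; the error
// return type is an inference, not confirmed by the source.
type releaser interface {
	ReleaseSnapshotRefs() error
}

// snapshotStore is a hypothetical stand-in for TraceSnapshotStore.
type snapshotStore struct {
	snaps []releaser
}

// Close releases every retained snapshot and returns all failures joined
// into one error, so a caller can append it to its own error list.
func (s *snapshotStore) Close() error {
	var errs []error
	for _, snap := range s.snaps {
		if err := snap.ReleaseSnapshotRefs(); err != nil {
			errs = append(errs, err)
		}
	}
	s.snaps = nil
	return errors.Join(errs...) // nil when nothing failed
}

var errMismatch = errors.New("refcount mismatch")

// fakeSnap returns a fixed error on release, for demonstration.
type fakeSnap struct{ err error }

func (f fakeSnap) ReleaseSnapshotRefs() error { return f.err }

func main() {
	store := &snapshotStore{snaps: []releaser{fakeSnap{}, fakeSnap{err: errMismatch}}}
	err := store.Close()
	fmt.Println("close error:", err)
	fmt.Println("is refcount mismatch:", errors.Is(err, errMismatch))
	fmt.Println("second close error:", store.Close())
}
```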

Reviewed by Cursor Bugbot for commit 4e744cf.

@Kbhat1 Kbhat1 added this pull request to the merge queue May 12, 2026
Merged via the queue into main with commit 3816ec8 May 12, 2026
40 checks passed
@Kbhat1 Kbhat1 deleted the pr/trace-snapshot-main branch May 12, 2026 15:30

3 participants